Markov Chain Monte Carlo I

Chelsea Parlett-Pelleriti

Monte Carlo Methods

\(\pi\) Monte Carlo

\[ \text{area of square} = (2r)^2 = 4r^2 \\ \text{area of circle} = \pi r^2 \\ \frac{\text{area of circle}}{ \text{area of square}} = \frac{\pi}{4} \]


# simulate data
set.seed(540)
x <- runif(1000,0,1)
y <- runif(1000,0,1)
df <- data.frame(x,y)

# check if dots are inside circle
df$in_circle <- sqrt((df$x - 0.5)^2 +
                       (df$y - 0.5)^2) <= 0.5
print(paste("sim   : ", mean(df$in_circle)))
[1] "sim   :  0.777"
print(paste0("actual: ", round(pi/4,3)))
[1] "actual: 0.785"
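Since the in-circle fraction estimates \(\pi/4\), multiplying by 4 gives an estimate of \(\pi\) itself. A minimal sketch reusing the same seed and setup as above:

```r
# estimate pi: fraction of points inside the circle ~ pi/4, so scale by 4
set.seed(540)
x <- runif(1000, 0, 1)
y <- runif(1000, 0, 1)
in_circle <- sqrt((x - 0.5)^2 + (y - 0.5)^2) <= 0.5
4 * mean(in_circle)  # 3.108 with this seed (true value ~ 3.14159)
```

More points shrink the Monte Carlo error, at roughly a \(1/\sqrt{n}\) rate.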


Two-in-a-row

I roll a die repeatedly, and only stop when I get the same number twice in a row.

❓ What is \(\mathbb{E}(\text{# of die rolls})\)?

Two-in-a-row

  • \(S_0\): haven’t rolled yet
  • \(S_1\): we haven’t gotten 2-in-a-row yet, but we’ve rolled
  • \(S_2\): last two rolls are the same! win!

\[ \mathbb{E}_0 = \underbrace{1}_\text{first roll} + \mathbb{E}_1 \]

\[ \mathbb{E}_1 = \underbrace{\frac{1}{6}(1)}_\text{matched last roll} + \underbrace{\frac{5}{6} (1 + \mathbb{E}_1)}_\text{didn't} \\ \mathbb{E}_1 - \frac{5}{6}\mathbb{E}_1 = \frac{1}{6} + \frac{5}{6} \\ \frac{1}{6}\mathbb{E}_1 = 1 \\ \mathbb{E}_1 = 6 \\ \mathbb{E}_0 = 1 + \mathbb{E}_1 = 7 \]

Two-in-a-row

n_sim <- 1000
sim_game <- function(sides = 6){
  won <- FALSE
  last_roll <- 999
  n_rolls <- 0
  while(!won){
    roll <- sample(1:sides, size = 1) # die roll
    n_rolls <- n_rolls + 1
    if (roll == last_roll){
      return(n_rolls)
    } else {
      last_roll <- roll
    }
  }
}
out <- sapply(1:n_sim, function(x) sim_game())
mean(out)
[1] 7.27
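The derivation generalizes: for an \(s\)-sided die, \(\mathbb{E}_1 = s\) (a geometric wait with success probability \(1/s\)) and \(\mathbb{E}_0 = s + 1\). A quick simulation check with a 10-sided die (`sim_game_s` is a hypothetical variant of the function above that actually uses its `sides` argument):

```r
# two-in-a-row with an s-sided die: expect E(# rolls) = s + 1
sim_game_s <- function(sides = 6) {
  last_roll <- 0
  n_rolls <- 0
  repeat {
    roll <- sample(1:sides, size = 1)
    n_rolls <- n_rolls + 1
    if (roll == last_roll) return(n_rolls)
    last_roll <- roll
  }
}
set.seed(540)
mean(replicate(20000, sim_game_s(sides = 10)))  # should be near 10 + 1 = 11
```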

Advantages/Disadvantages

Advantages

  • “Easy”

  • Sometimes Fast

Disadvantages

  • Not always fast

  • Not always exact

Markov Chains

Markov Process Definition

Graph of states + transitions.

Markov Property: Future state depends only on current state (efficient…but can cause problems)

\[ P(X_n = x | X_1 = x_1, X_2 = x_2, ..., X_{n-1}= x_{n-1}) = P(X_n = x | X_{n-1}= x_{n-1}) \]

Markov Transition Matrices

\[ A = \begin{bmatrix} 0.2 & 0.3 & 0.5 \\ 0.6 & 0 & 0 \\ 0.2 & 0.7 & 0.5 \end{bmatrix} \]

A steady state or stationary distribution is a probability vector \(x\) with \(Ax = x\) (each column of \(A\) above sums to 1, so \(A\) acts on a column vector of state probabilities)

Markov Transition Matrices

\[ A = \begin{bmatrix} 0.2 & 0.3 & 0.5 \\ 0.6 & 0 & 0 \\ 0.2 & 0.7 & 0.5 \end{bmatrix} \]

# burger, pizza, hotdog
tr <- matrix(c(0.2, 0.6, 0.2,
               0.3,0,0.7,
               0.5,0,0.5), ncol = 3)
# stationary distribution: eigenvector for eigenvalue 1, rescaled to sum to 1
eigen(tr)$vectors[,1]/sum(eigen(tr)$vectors[,1])
[1] 0.3521127+0i 0.2112676+0i 0.4366197+0i
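The same steady state can be reached by power iteration: start from any distribution and repeatedly apply the transition matrix. A minimal check (`tr` restated so this runs standalone):

```r
tr <- matrix(c(0.2, 0.6, 0.2,
               0.3, 0,   0.7,
               0.5, 0,   0.5), ncol = 3)
x <- c(1, 0, 0)                    # start: always burger
for (i in 1:100) x <- as.vector(tr %*% x)
round(x, 4)  # 0.3521 0.2113 0.4366, matching the eigenvector above
```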

Markov Transition Matrices

Detailed Balance Condition

\[ p(x)T(y|x) = p(y)T(x|y) \,\,\, \forall_{x,y} \Rightarrow \\ \sum_{x}p(x)T(y|x) = p(y)\sum_{x}T(x|y) = p(y) \Rightarrow \\ pT = p \]
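As a concrete illustration (my example, previewing the Metropolis rule used later in the deck): build a kernel for a small target \(p\) by proposing uniformly among the other states and accepting with probability \(\min(1, p(y)/p(x))\), then verify detailed balance and \(pT = p\) numerically:

```r
p <- c(0.1, 0.2, 0.3, 0.4)   # target distribution on 4 states
n <- length(p)
# Metropolis-style kernel: propose uniformly among the other states,
# accept with prob min(1, p(y)/p(x)); leftover mass stays put
Tm <- matrix(0, n, n)
for (x in 1:n) for (y in 1:n) if (x != y)
  Tm[x, y] <- (1 / (n - 1)) * min(1, p[y] / p[x])
diag(Tm) <- 1 - rowSums(Tm)
D <- p * Tm                           # D[x, y] = p(x) T(y|x)
max(abs(D - t(D)))                    # detailed balance: ~0
max(abs(as.vector(p %*% Tm) - p))     # pT = p: ~0
```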

Markov Chain Properties

  • Markov Property: state only relies on last state
  • probabilities sum to 1: \(\sum_{i=0}^n p_i = 1\) (true of the state distribution, and of the transitions out of each state)

Markov Seuss

Markov Chain Monte Carlo

Accept/Reject Sampling

Problem: sample from \(p(x)\), but we don’t have its PDF/CDF; we only know its shape \(f(x)\)

\[ p(x) \propto f(x) \]

Solution: sample from \(g(x)\) and only accept samples according to how likely they are in \(p(x)\)

Accept/Reject Sampling

  • sample \(x\) from \(g(x)\)
  • accept with prob \(\frac{f(x)}{M g(x)}\), where \(M\) is chosen so that \(M g(x) \geq f(x)\) for all \(x\)

Accept/Reject Sampling

Proof:

\[ D(x | A) = \frac{P(A|x)D(x)}{P(A)} = \frac{\frac{f(x)}{M g(x)} g(x)}{P(A)}\\ P(A) = \int g(x) \frac{f(x)}{M g(x)} dx = \frac{1}{M} \int f(x)dx = \frac{NC}{M} \\ D(x | A) = \frac{f(x)/M}{NC/M} = \frac{f(x)}{NC} = p(x) \]
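A runnable sketch of the whole procedure (my example, not from the slides): take target shape \(f(x) = x(1-x)\) on \([0,1]\), i.e. an unnormalized Beta(2, 2) with \(NC = 1/6\), proposal \(g = \text{Uniform}(0,1)\), and envelope \(M = 1/4 = \max f\). The acceptance rate should land near \(NC/M = 2/3\), exactly as the proof’s \(P(A)\) predicts:

```r
set.seed(540)
f <- function(x) x * (1 - x)       # unnormalized target shape
M <- 0.25                          # max of f, so M * g(x) >= f(x) on [0, 1]
x <- runif(50000)                  # proposals from g = Uniform(0, 1)
keep <- runif(50000) <= f(x) / M   # accept with prob f(x) / (M * g(x))
samples <- x[keep]
mean(keep)      # acceptance rate ~ NC / M = (1/6) / (1/4) = 2/3
mean(samples)   # Beta(2, 2) has mean 0.5
```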

Accept/Reject Sampling

Cons: draws are independent, so we throw away the information that we found a good value; it can be inefficient; and it’s hard to choose \(g(x)\).

Enter MCMC! 💡 Next sample depends on previous sample.

Markov Chain Monte Carlo

Markov Chain: we’re using a Markov process to generate proposed samples

Monte Carlo: we’re drawing random samples to estimate a distribution

Markov Chain Monte Carlo

We want a Markov chain whose steady state is the target distribution \(p(x)\).


❓ Do Markov processes converge to their steady state right away?

Markov Chain Monte Carlo

Goal: find a markov chain whose steady state (after a burn-in period) is \(p(x)\)

King Markov

👑 King Markov wants to visit his 5 islands with populations 100, 200, 300, 400, 500

Metropolis Islands

King Markov

Requirements:

  • visit islands proportional to population

  • no tracking islands/populations over time

  • randomness/spontaneity of visits

❓ How?

King Markov

Proposal:

  1. Randomly choose an island using \(J \left (\color{blue}{\text{island}_n} | \color{red}{\text{island}_{n-1}}\right )\)
  2. Send scout to \(\color{blue}{\text{island}_n}\)
  3. Ask for \(\color{blue}{\text{island}_n}\) population
  4. Go to \(\color{blue}{\text{island}_n}\) with probability \(A\), stay at \(\color{red}{\text{island}_{n-1}}\) with probability \(1-A\)

💡 How would you choose \(J\) and \(A\)?

King Markov

  • \(J\): any random process where \(J(a | b) = J(b | a)\)

  • \(A\): \(\min \left (1,\frac{pop(\color{blue}{\text{island}_n})}{pop(\color{red}{\text{island}_{n-1}})} \right )\)

King Markov

markov_step <- function(island){
  islands <- LETTERS[1:5]
  islands_pop <- list(A = 100,
                      B = 200,
                      C = 300,
                      D = 400,
                      E = 500)
  
  proposed_i <- sample(islands[islands != island],
                       size = 1)
  
  accept_prob <- min(1,
                islands_pop[[proposed_i]]/islands_pop[[island]])
  
  u <- runif(1)
  accepted <-  u <= accept_prob
  value <- ifelse(accepted,
                  proposed_i,
                  island)
  
  value
  
}

markov_step("A")
[1] "C"

King Markov

n_sim <- 10000
isl <- "A"
track <- c(isl)

for (i in 1:n_sim){
  isl <- markov_step(isl)
  track <- c(track, isl)
}

library(ggplot2)
ggplot(data = data.frame(track),
       aes(x = track, fill = track)) + 
  geom_bar(color = "black") + 
  theme_minimal() + 
  theme(panel.grid = element_blank(),
        legend.position = "none") +
  labs(x = "Island",
       y = "Nights Spent")
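Beyond eyeballing the bar chart, we can compare visit frequencies to the target proportions numerically. A self-contained sketch (my rewrite, with markov_step compressed inline so it runs on its own):

```r
set.seed(540)
pops <- c(A = 100, B = 200, C = 300, D = 400, E = 500)
isl <- "A"
track <- character(10000)
for (i in 1:10000) {
  # propose one of the other islands uniformly, accept via population ratio
  proposed <- sample(setdiff(names(pops), isl), size = 1)
  if (runif(1) <= min(1, pops[[proposed]] / pops[[isl]])) isl <- proposed
  track[i] <- isl
}
round(prop.table(table(track)), 2)
round(pops / sum(pops), 2)   # targets: 0.07 0.13 0.20 0.27 0.33
```

The empirical frequencies should match the targets up to Monte Carlo error.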

King Markov

King Markov

Vocabulary:

  • chain: sequence of draws from a distribution

  • Markov chain: the next state in the chain depends only on the previous state

  • draw: one sample/state from the chain

Metropolis-Hastings

\[ A(x_n \rightarrow x_*) = \min \left (1, \frac{f(x_*)}{f(x_n)} \times \frac{q(x_n | x_*)}{q(x_* | x_n)} \right ) \]

\(\frac{q(x_n | x_*)}{q(x_* | x_n)}\) corrects for bias in which values the proposal distribution will generate.

when \(q(x_n | x_*) >q(x_* | x_n)\) the correction makes us more likely to go to \(x_*\)


When \(q(x_n | x_*) = q(x_* | x_n)\) the correction goes away, leaving us with King Markov’s method (Metropolis Algorithm).

Metropolis-Hastings

\(q(x_n | x_*) = q(x_* | x_n)\)

Metropolis-Hastings

\(q(x_n | x_*) \neq q(x_* | x_n)\)

Metropolis-Hastings

metropolis_step <- function(x, sigma) {
  # (1) generate proposed value from N(x, sigma)
  proposed_x <- rnorm(1, mean = x, sd = sigma)
  
  # (2) calculate the probability A of accepting the proposal;
  # target() is the (possibly unnormalized) target density, defined elsewhere
  accept_prob <- min(1, target(proposed_x) / target(x))
  
  # (3) accept or reject the proposal
  accepted <- runif(1) <= accept_prob
  value <- ifelse(accepted,
                  proposed_x,
                  x)
  
  value
}
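To watch the step function explore a distribution, here is a self-contained demo (my choice of target: an unnormalized standard normal shape; metropolis_step is restated compactly so the snippet runs on its own):

```r
target <- function(x) exp(-x^2 / 2)   # unnormalized N(0, 1) shape

metropolis_step <- function(x, sigma) {
  proposed_x <- rnorm(1, mean = x, sd = sigma)
  accept_prob <- min(1, target(proposed_x) / target(x))
  if (runif(1) <= accept_prob) proposed_x else x
}

set.seed(540)
draws <- numeric(5000)
x <- 0
for (i in 1:5000) {
  x <- metropolis_step(x, sigma = 1)
  draws[i] <- x
}
round(c(mean = mean(draws), sd = sd(draws)), 2)  # should be near 0 and 1
```

The draws are autocorrelated, so the effective sample size is smaller than 5000; sigma tunes the trade-off between acceptance rate and step size.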

code adapted from: Navarro, Danielle. 2023. “The Metropolis-Hastings Algorithm.” April 12, 2023. https://blog.djnavarro.net/posts/2023-04-12_metropolis-hastings/.

Peek at Future

Tons of new MCMC algorithms exist. But the cool ones use gradients of the target to choose samples.

Many state-of-the-art (SOTA) samplers use Hamiltonian Monte Carlo.